| Name | Student ID |
|---|---|
| SAKTHIVEL G | 2022AC05688 |
| ADITYA GAURAV | 2022ac05101 |
| BHUVANJEET SINGH GANDHI | 2022ac05606 |
| KALRA GEETANSH | 2022ac05174 |
Harish Bhagwan Raggal and Neeraj Gupta did not contribute; we tried contacting them but received no response.
Group 18 - MLOps Assignment 2
We run this notebook in the auto-sklearn Docker image from the official site, which already includes most of auto-sklearn's dependencies.
The command below installs the three additional libraries the assignment needs: dataprep, shap, and lime.
! pip install dataprep shap lime
Requirement already satisfied: dataprep in /usr/local/lib/python3.8/dist-packages (0.4.5)
Requirement already satisfied: shap in /usr/local/lib/python3.8/dist-packages (0.44.1)
Requirement already satisfied: lime in /usr/local/lib/python3.8/dist-packages (0.2.0.1)
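As a lighter alternative to reading the pip log, the installed versions can be checked from Python. This is a minimal sketch using only the standard library; the helper name `installed_versions` is our own and is not part of the notebook.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The three packages named in the pip command above.
for name, version in installed_versions(["dataprep", "shap", "lime"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Recording these versions next to the results makes it easier to reproduce the run in a different container.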
This block imports all the key libraries required for the assignment:
import pandas as pd
import autosklearn
from dataprep.eda import create_report
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from pickle import dump, load
import lime
import lime.lime_tabular
import numpy as np
import shap
import warnings
warnings.filterwarnings('ignore')
print(autosklearn.__version__)
0.15.0
df = pd.read_csv("liver_disease_1.csv")
df.head()
|  | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Dataset |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 65 | 0.7 | 0.1 | 187 | 16 | 18 | 6.8 | 3.3 | 0.90 | Yes |
| 1 | 62 | 10.9 | 5.5 | 699 | 64 | 100 | 7.5 | 3.2 | 0.74 | Yes |
| 2 | 62 | 7.3 | 4.1 | 490 | 60 | 68 | 7.0 | 3.3 | 0.89 | Yes |
| 3 | 58 | 1.0 | 0.4 | 182 | 14 | 20 | 6.8 | 3.4 | 1.00 | Yes |
| 4 | 72 | 3.9 | 2.0 | 195 | 27 | 59 | 7.3 | 2.4 | 0.40 | Yes |
report = create_report(df, title='Liver Disease Dataset EDA Report')
report.save('Liver_disease_dataset_eda_report.html')
report
Report has been saved to Liver_disease_dataset_eda_report.html!
| Dataset Statistic | Value |
|---|---|
| Number of Variables | 10 |
| Number of Rows | 583 |
| Missing Cells | 0 |
| Missing Cells (%) | 0.0% |
| Duplicate Rows | 13 |
| Duplicate Rows (%) | 2.2% |
| Total Size in Memory | 75.1 KB |
| Average Row Size in Memory | 131.9 B |
| Variable Types | Numerical: 9, Categorical: 1 |
| Insight | Type |
|---|---|
| Total_Bilirubin is skewed | Skewed |
| Direct_Bilirubin is skewed | Skewed |
| Alkaline_Phosphotase is skewed | Skewed |
| Alamine_Aminotransferase is skewed | Skewed |
| Aspartate_Aminotransferase is skewed | Skewed |
| Albumin_and_Globulin_Ratio is skewed | Skewed |
| Dataset has 13 (2.23%) duplicate rows | Duplicates |
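The two kinds of insight flagged above (skewed columns and duplicate rows) can be illustrated with a short standard-library sketch. The data here is hypothetical toy data, and the formulas are common textbook definitions; we have not confirmed that they match the exact definitions dataprep uses.

```python
from collections import Counter


def skewness(values):
    """Population (biased) skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def count_duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row."""
    counts = Counter(tuple(row) for row in rows)
    return sum(c - 1 for c in counts.values())


# Hypothetical toy values, not taken from the liver dataset.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 50]
print(skewness(symmetric), skewness(right_tailed))

toy_rows = [(65, 0.7), (62, 10.9), (65, 0.7)]
print(count_duplicate_rows(toy_rows))
```

On a DataFrame, pandas offers similar checks through `DataFrame.skew` and `DataFrame.duplicated`. A long right tail, as in the bilirubin and enzyme columns, shows up as a large positive skewness.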
Age (numerical)
| Approximate Distinct Count | 72 |
|---|---|
| Approximate Unique (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 44.7461 |
| Minimum | 4 |
| Maximum | 90 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 4 |
|---|---|
| 5-th Percentile | 18 |
| Q1 | 33 |
| Median | 45 |
| Q3 | 58 |
| 95-th Percentile | 72 |
| Maximum | 90 |
| Range | 86 |
| IQR | 25 |
| Mean | 44.7461 |
|---|---|
| Standard Deviation | 16.1898 |
| Variance | 262.1107 |
| Sum | 26087 |
| Skewness | -0.02931 |
| Kurtosis | -0.5655 |
| Coefficient of Variation | 0.3618 |
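Several entries in these per-variable tables are derived from the others. The sketch below recomputes the range, IQR, and coefficient of variation from the values reported in the preceding tables (minimum 4, maximum 90, Q1 33, Q3 58, mean 44.7461, standard deviation 16.1898); the variable names are our own.

```python
# Values copied from the summary tables directly above.
stats = {"min": 4, "max": 90, "q1": 33, "q3": 58, "mean": 44.7461, "std": 16.1898}

value_range = stats["max"] - stats["min"]   # Range = Maximum - Minimum
iqr = stats["q3"] - stats["q1"]             # IQR = Q3 - Q1
cv = stats["std"] / stats["mean"]           # Coefficient of Variation = std / mean
variance = stats["std"] ** 2                # Variance = std squared

print(value_range, iqr, round(cv, 4), round(variance, 2))
```

The recomputed variance differs from the reported 262.1107 only in the later decimals, because the standard deviation in the table is already rounded.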
Total_Bilirubin (numerical)
| Approximate Distinct Count | 113 |
|---|---|
| Approximate Unique (%) | 19.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 3.2988 |
| Minimum | 0.4 |
| Maximum | 75 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 0.4 |
|---|---|
| 5-th Percentile | 0.6 |
| Q1 | 0.8 |
| Median | 1 |
| Q3 | 2.6 |
| 95-th Percentile | 16.35 |
| Maximum | 75 |
| Range | 74.6 |
| IQR | 1.8 |
| Mean | 3.2988 |
|---|---|
| Standard Deviation | 6.2095 |
| Variance | 38.5582 |
| Sum | 1923.2 |
| Skewness | 4.8948 |
| Kurtosis | 36.8356 |
| Coefficient of Variation | 1.8824 |
Direct_Bilirubin (numerical)
| Approximate Distinct Count | 80 |
|---|---|
| Approximate Unique (%) | 13.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 1.4861 |
| Minimum | 0.1 |
| Maximum | 19.7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 0.1 |
|---|---|
| 5-th Percentile | 0.1 |
| Q1 | 0.2 |
| Median | 0.3 |
| Q3 | 1.3 |
| 95-th Percentile | 8.4 |
| Maximum | 19.7 |
| Range | 19.6 |
| IQR | 1.1 |
| Mean | 1.4861 |
|---|---|
| Standard Deviation | 2.8085 |
| Variance | 7.8877 |
| Sum | 866.4 |
| Skewness | 3.2041 |
| Kurtosis | 11.2451 |
| Coefficient of Variation | 1.8898 |
Alkaline_Phosphotase (numerical)
| Approximate Distinct Count | 263 |
|---|---|
| Approximate Unique (%) | 45.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 290.5763 |
| Minimum | 63 |
| Maximum | 2110 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 63 |
|---|---|
| 5-th Percentile | 137 |
| Q1 | 175.5 |
| Median | 208 |
| Q3 | 298 |
| 95-th Percentile | 698.1 |
| Maximum | 2110 |
| Range | 2047 |
| IQR | 122.5 |
| Mean | 290.5763 |
|---|---|
| Standard Deviation | 242.938 |
| Variance | 59018.8666 |
| Sum | 169406 |
| Skewness | 3.7554 |
| Kurtosis | 17.5907 |
| Coefficient of Variation | 0.8361 |
Alamine_Aminotransferase (numerical)
| Approximate Distinct Count | 152 |
|---|---|
| Approximate Unique (%) | 26.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 80.7136 |
| Minimum | 10 |
| Maximum | 2000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 10 |
|---|---|
| 5-th Percentile | 15 |
| Q1 | 23 |
| Median | 35 |
| Q3 | 60.5 |
| 95-th Percentile | 232 |
| Maximum | 2000 |
| Range | 1990 |
| IQR | 37.5 |
| Mean | 80.7136 |
|---|---|
| Standard Deviation | 182.6204 |
| Variance | 33350.1944 |
| Sum | 47056 |
| Skewness | 6.5323 |
| Kurtosis | 50.1364 |
| Coefficient of Variation | 2.2626 |
Aspartate_Aminotransferase (numerical)
| Approximate Distinct Count | 177 |
|---|---|
| Approximate Unique (%) | 30.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 109.9108 |
| Minimum | 10 |
| Maximum | 4929 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 10 |
|---|---|
| 5-th Percentile | 15.1 |
| Q1 | 25 |
| Median | 42 |
| Q3 | 87 |
| 95-th Percentile | 400.9 |
| Maximum | 4929 |
| Range | 4919 |
| IQR | 62 |
| Mean | 109.9108 |
|---|---|
| Standard Deviation | 288.9185 |
| Variance | 83473.9164 |
| Sum | 64078 |
| Skewness | 10.519 |
| Kurtosis | 149.6184 |
| Coefficient of Variation | 2.6287 |
Total_Protiens (numerical)
| Approximate Distinct Count | 58 |
|---|---|
| Approximate Unique (%) | 9.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 6.4832 |
| Minimum | 2.7 |
| Maximum | 9.6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 2.7 |
|---|---|
| 5-th Percentile | 4.61 |
| Q1 | 5.8 |
| Median | 6.6 |
| Q3 | 7.2 |
| 95-th Percentile | 8.1 |
| Maximum | 9.6 |
| Range | 6.9 |
| IQR | 1.4 |
| Mean | 6.4832 |
|---|---|
| Standard Deviation | 1.0855 |
| Variance | 1.1782 |
| Sum | 3779.7 |
| Skewness | -0.2849 |
| Kurtosis | 0.2208 |
| Coefficient of Variation | 0.1674 |
Albumin (numerical)
| Approximate Distinct Count | 40 |
|---|---|
| Approximate Unique (%) | 6.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 3.1419 |
| Minimum | 0.9 |
| Maximum | 5.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 0.9 |
|---|---|
| 5-th Percentile | 1.8 |
| Q1 | 2.6 |
| Median | 3.1 |
| Q3 | 3.8 |
| 95-th Percentile | 4.39 |
| Maximum | 5.5 |
| Range | 4.6 |
| IQR | 1.2 |
| Mean | 3.1419 |
|---|---|
| Standard Deviation | 0.7955 |
| Variance | 0.6329 |
| Sum | 1831.7 |
| Skewness | -0.04357 |
| Kurtosis | -0.3949 |
| Coefficient of Variation | 0.2532 |
Albumin_and_Globulin_Ratio (numerical)
| Approximate Distinct Count | 69 |
|---|---|
| Approximate Unique (%) | 11.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 9328 |
| Mean | 0.947 |
| Minimum | 0.3 |
| Maximum | 2.8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |
| Minimum | 0.3 |
|---|---|
| 5-th Percentile | 0.5 |
| Q1 | 0.7 |
| Median | 0.93 |
| Q3 | 1.1 |
| 95-th Percentile | 1.5 |
| Maximum | 2.8 |
| Range | 2.5 |
| IQR | 0.4 |
| Mean | 0.947 |
|---|---|
| Standard Deviation | 0.3189 |
| Variance | 0.1017 |
| Sum | 552.13 |
| Skewness | 0.9896 |
| Kurtosis | 3.2583 |
| Coefficient of Variation | 0.3367 |
Dataset (categorical)
| Approximate Distinct Count | 2 |
|---|---|
| Approximate Unique (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory Size | 39477 |
| Mean | 2.7136 |
|---|---|
| Standard Deviation | 0.4525 |
| Median | 3 |
| Minimum | 2 |
| Maximum | 3 |
| 1st row | Yes |
|---|---|
| 2nd row | Yes |
| 3rd row | Yes |
| 4th row | Yes |
| 5th row | Yes |
| Count | 1582 |
|---|---|
| Lowercase Letter | 999 |
| Space Separator | 0 |
| Uppercase Letter | 583 |
| Dash Punctuation | 0 |
| Decimal Number | 0 |
This preprocessing step balances the two classes and standardizes the features, so that the model is less biased toward the majority class and all features are on a comparable scale.
def random_oversample(X_train, y_train, target_column='target'):
    """
    Perform random oversampling on the minority class to balance a binary dataset.

    Parameters:
        X_train (pd.DataFrame): Feature data.
        y_train (pd.Series): Target labels.
        target_column (str): Name of the temporary target column in the combined DataFrame. Default is 'target'.

    Returns:
        pd.DataFrame, pd.Series: Resampled X_train and y_train.
    """
    # Combine on a copy so the caller's X_train is not mutated
    combined = X_train.copy()
    combined[target_column] = y_train
    # Identify the classes by frequency instead of assuming label 1 is the minority
    # (the EDA length statistics indicate that "Yes", mapped to 1, is the more frequent class)
    class_counts = combined[target_column].value_counts()  # sorted in descending order
    majority_label, minority_label = class_counts.index[0], class_counts.index[-1]
    minority_class = combined[combined[target_column] == minority_label]
    majority_class = combined[combined[target_column] == majority_label]
    # Sample the minority class with replacement up to the majority class size
    minority_oversampled = minority_class.sample(n=len(majority_class), replace=True, random_state=42)
    # Combine the oversampled minority class with the majority class
    oversampled_data = pd.concat([majority_class, minority_oversampled])
    # Separate features and target again
    X_train_resampled = oversampled_data.drop(target_column, axis=1)
    y_train_resampled = oversampled_data[target_column]
    return X_train_resampled, y_train_resampled
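The idea behind random oversampling can be sketched on a toy list with the standard library alone. This is an illustration, not the notebook's function: the name `oversample_minority` and the toy data are hypothetical, it assumes exactly two classes, and unlike the pandas version it keeps every original minority row and only adds the missing number of copies.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=42):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    counts = Counter(labels)
    # most_common sorts by frequency, so the first entry is the majority class.
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    extra = rng.choices(minority_rows, k=n_major - n_minor)  # sampling with replacement
    return rows + extra, labels + [minority] * len(extra)


# Hypothetical toy data: four rows of class 1, two rows of class 0.
rows = [[1], [2], [3], [4], [5], [6]]
labels = [1, 1, 1, 1, 0, 0]

new_rows, new_labels = oversample_minority(rows, labels)
print(Counter(new_labels))
```

No new feature values are created; the balance comes purely from repeating existing minority rows, which is what distinguishes random oversampling from synthetic methods such as SMOTE.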
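# Forward-fill any missing ratio values (the EDA report shows none in this file, so this is a safeguard)
df["Albumin_and_Globulin_Ratio"] = df["Albumin_and_Globulin_Ratio"].ffill()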
test_ratio = 0.2
# Before we apply any feature engineering technique like upsampling or normalisation we need to split out dataset btw train and test to avoid data leakage.
X = df.iloc[:,:-1]
y = df.iloc[:,[-1]]
y = y["Dataset"].map({"Yes":1, "No":0}) # Yes means patient has liver disease.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y , test_size=test_ratio , random_state=100)
# Fixing class imabalnce using SMOTE
X_train, y_train = random_oversample(X_train, y_train)
# lets do standard scalling of all other features so that we have all the features before we put into model.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(X_train)
X_train_final = pd.DataFrame(x_train_scaled, columns=X_train.columns)
dump(scaler, open('./scaler.pkl', 'wb'))
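To make the leakage argument concrete, here is a minimal pure-Python sketch of what fitting a scaler on the training split and reusing it on the test split means for a single column. The helper names `fit_standardizer` and `standardize` and the toy values are hypothetical; they mimic the population-standard-deviation scaling that `StandardScaler` applies, not its actual implementation.

```python
from statistics import mean, pstdev

def fit_standardizer(train_values):
    """Learn the mean and population standard deviation from the training column only."""
    return mean(train_values), pstdev(train_values)

def standardize(values, mu, sigma):
    """Apply previously learned statistics to any split."""
    return [(v - mu) / sigma for v in values]

# Toy column (hypothetical values, not taken from the liver dataset)
train_col = [1.0, 2.0, 3.0, 4.0]
test_col = [2.5, 10.0]

mu, sigma = fit_standardizer(train_col)
train_scaled = standardize(train_col, mu, sigma)
test_scaled = standardize(test_col, mu, sigma)  # reuses the training statistics

print("train mean/std:", mu, round(sigma, 4))
print("scaled test values:", [round(v, 4) for v in test_scaled])
```

The test values are never used to compute `mu` or `sigma`, so an extreme test value such as 10.0 cannot shift the scale the model was trained on.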
The goal of this task was to train multiple models, tune hyperparameters, and select the best-performing model for predicting liver disease. To automate the model selection and hyperparameter tuning process, we used AutoML via the Auto-sklearn library, which automates the entire workflow, including data preprocessing, model training, and hyperparameter optimization.
By using Auto-sklearn, we were able to automate the model training and hyperparameter tuning process, testing a wide range of algorithms and configurations within a limited time frame. The final ensemble combines 19 pipelines, including RandomForest, AdaBoost, Extra Trees, MLP, Passive Aggressive, and Linear Discriminant Analysis classifiers. Automatically selecting and weighting models in this way is intended to make performance more robust and to reduce overfitting; it does not guarantee either, so the ensemble still has to be checked on held-out data.
import autosklearn.classification as auto_classifier
# Search budget in seconds: 180 s in total, at most 40 s per candidate pipeline
autoclassifier = auto_classifier.AutoSklearnClassifier(time_left_for_this_task=180,
                                                       per_run_time_limit=40)
autoclassifier.fit(X_train_final, y_train)
autoclassifier.show_models()
{10: {'model_id': 10,
'rank': 1,
'cost': 0.18181818181818177,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7ffff08a2880>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fff8177f640>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7ffff0543d60>,
'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=2, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
12: {'model_id': 12,
'rank': 2,
'cost': 0.21590909090909094,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fff9027bee0>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fff88745430>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fff88745fa0>,
'sklearn_classifier': AdaBoostClassifier(algorithm='SAMME',
base_estimator=DecisionTreeClassifier(max_depth=2),
learning_rate=0.13167493237005792, n_estimators=56,
random_state=1)},
16: {'model_id': 16,
'rank': 3,
'cost': 0.21590909090909094,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fff9022fee0>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fff8878a310>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fff8878abb0>,
'sklearn_classifier': PassiveAggressiveClassifier(C=0.14833233294431605, average=True,
loss='squared_hinge', max_iter=16, random_state=1,
tol=0.00016482166646253793, warm_start=True)},
17: {'model_id': 17,
'rank': 4,
'cost': 0.21590909090909094,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fff8873d340>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fff9005d730>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fff9005da00>,
'sklearn_classifier': AdaBoostClassifier(algorithm='SAMME',
base_estimator=DecisionTreeClassifier(max_depth=2),
learning_rate=0.03734246906377268, n_estimators=416,
random_state=1)},
21: {'model_id': 21,
'rank': 5,
'cost': 0.18181818181818177,
'ensemble_weight': 0.04,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7ffff075e040>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7ffff0254be0>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7ffff0254dc0>,
'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=44, min_samples_leaf=2,
min_samples_split=20, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
24: {'model_id': 24,
'rank': 6,
'cost': 0.13636363636363635,
'ensemble_weight': 0.1,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fff902c6460>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7ffff017d610>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7ffff017d7f0>,
'sklearn_classifier': ExtraTreesClassifier(max_features=2, min_samples_split=4, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
29: {'model_id': 29,
'rank': 7,
'cost': 0.21590909090909094,
'ensemble_weight': 0.08,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7ffff02391c0>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb1a7cf40>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7ffff05fd0a0>,
'sklearn_classifier': MLPClassifier(alpha=0.0007119897774330087, beta_1=0.999, beta_2=0.9,
hidden_layer_sizes=(51, 51, 51),
learning_rate_init=0.00028079049815589414, max_iter=128,
n_iter_no_change=32, random_state=1, validation_fraction=0.0,
verbose=0, warm_start=True)},
34: {'model_id': 34,
'rank': 8,
'cost': 0.23863636363636365,
'ensemble_weight': 0.12,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7ffff01f1160>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb18bed30>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb173db50>,
'sklearn_classifier': ExtraTreesClassifier(bootstrap=True, max_features=1, min_samples_leaf=14,
min_samples_split=14, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
46: {'model_id': 46,
'rank': 9,
'cost': 0.23863636363636365,
'ensemble_weight': 0.08,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7ffff01a6520>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb175c370>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb175c610>,
'sklearn_classifier': ExtraTreesClassifier(bootstrap=True, criterion='entropy', max_features=1,
min_samples_leaf=16, min_samples_split=5, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
58: {'model_id': 58,
'rank': 10,
'cost': 0.21590909090909094,
'ensemble_weight': 0.1,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb1950c40>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb14682b0>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb14685b0>,
'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=6, min_samples_leaf=16,
min_samples_split=20, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
65: {'model_id': 65,
'rank': 11,
'cost': 0.1477272727272727,
'ensemble_weight': 0.08,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb1887640>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb12bd250>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb12bd700>,
'sklearn_classifier': ExtraTreesClassifier(max_features=2, min_samples_split=7, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
66: {'model_id': 66,
'rank': 12,
'cost': 0.2272727272727273,
'ensemble_weight': 0.12,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb172c850>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb10b77f0>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb10b7be0>,
'sklearn_classifier': ExtraTreesClassifier(max_features=3, min_samples_leaf=20, min_samples_split=17,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)},
74: {'model_id': 74,
'rank': 13,
'cost': 0.2272727272727273,
'ensemble_weight': 0.04,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb13b9c70>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb0ed3160>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb0ed35b0>,
'sklearn_classifier': LinearDiscriminantAnalysis(shrinkage='auto', solver='lsqr',
tol=0.011632803126809681)},
75: {'model_id': 75,
'rank': 14,
'cost': 0.21590909090909094,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb129dd00>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb0c833d0>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb0c83640>,
'sklearn_classifier': PassiveAggressiveClassifier(C=0.01771591080165321, average=True, max_iter=16,
random_state=1, tol=9.18644810240989e-05,
warm_start=True)},
79: {'model_id': 79,
'rank': 15,
'cost': 0.15909090909090906,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb0fee1c0>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb137a310>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb137a160>,
'sklearn_classifier': ExtraTreesClassifier(bootstrap=True, criterion='entropy', max_features=2,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)},
89: {'model_id': 89,
'rank': 16,
'cost': 0.13636363636363635,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb0d9c2b0>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fff887ac670>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fff887ac8e0>,
'sklearn_classifier': ExtraTreesClassifier(max_features=2, min_samples_split=8, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
92: {'model_id': 92,
'rank': 17,
'cost': 0.17045454545454541,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb0bd1760>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb08702b0>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb0870e20>,
'sklearn_classifier': ExtraTreesClassifier(max_features=2, min_samples_split=18, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
93: {'model_id': 93,
'rank': 18,
'cost': 0.20454545454545459,
'ensemble_weight': 0.02,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7fffb13c9940>,
'balancing': Balancing(random_state=1, strategy='weighting'),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb0613640>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb0613a90>,
'sklearn_classifier': LinearDiscriminantAnalysis(shrinkage=0.6627083162415924, solver='lsqr',
tol=0.06631009386339572)},
94: {'model_id': 94,
'rank': 19,
'cost': 0.20454545454545459,
'ensemble_weight': 0.06,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice at 0x7ffff05273d0>,
'balancing': Balancing(random_state=1),
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice at 0x7fffb0488f70>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice at 0x7fffb04ae430>,
'sklearn_classifier': ExtraTreesClassifier(max_features=3, min_samples_leaf=2, min_samples_split=7,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)}}
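The `ensemble_weight` values in the output above can be summarised per classifier family. The sketch below copies the model ids and weights from that output and then illustrates weighted soft voting, which is our understanding of how such weights are applied to class probabilities. The `weighted_soft_vote` helper and the three probability vectors are hypothetical and do not reproduce Auto-sklearn's internal code.

```python
from collections import defaultdict

# (model_id, classifier family, ensemble_weight) copied from the show_models() output
members = [
    (10, "RandomForest", 0.02), (12, "AdaBoost", 0.02),
    (16, "PassiveAggressive", 0.02), (17, "AdaBoost", 0.02),
    (21, "ExtraTrees", 0.04), (24, "ExtraTrees", 0.10),
    (29, "MLP", 0.08), (34, "ExtraTrees", 0.12),
    (46, "ExtraTrees", 0.08), (58, "ExtraTrees", 0.10),
    (65, "ExtraTrees", 0.08), (66, "ExtraTrees", 0.12),
    (74, "LDA", 0.04), (75, "PassiveAggressive", 0.02),
    (79, "ExtraTrees", 0.02), (89, "ExtraTrees", 0.02),
    (92, "ExtraTrees", 0.02), (93, "LDA", 0.02),
    (94, "ExtraTrees", 0.06),
]

# Aggregate the ensemble weight per classifier family
weight_by_family = defaultdict(float)
for _, family, weight in members:
    weight_by_family[family] += weight
total_weight = sum(weight for _, _, weight in members)

for family, weight in sorted(weight_by_family.items(), key=lambda kv: -kv[1]):
    print(f"{family:<18} {weight:.2f}")
print(f"{'total':<18} {total_weight:.2f}")

def weighted_soft_vote(probabilities, weights):
    """Weighted average of per-model class-probability vectors."""
    n_classes = len(probabilities[0])
    return [sum(w * p[c] for p, w in zip(probabilities, weights))
            for c in range(n_classes)]

# Hypothetical probabilities from three members for one patient
member_probs = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
combined = weighted_soft_vote(member_probs, [0.5, 0.3, 0.2])
print("combined probabilities:", [round(p, 2) for p in combined])
```

Most of the weight sits on Extra Trees pipelines, so the ensemble is less diverse than the list of classifier names alone suggests.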
# Persist the fitted AutoML ensemble
with open('autoclassifier.pkl', 'wb') as f:
    dump(autoclassifier, f)
###################### PREDICTION USING AUTOCLASSIFIER ######################
from sklearn.metrics import accuracy_score
# Reload the fitted scaler and apply the training-set transform to the test features
with open('scaler.pkl', 'rb') as f:
    scaler = load(f)
X_test_scaled = scaler.transform(X_test)
X_test_final = pd.DataFrame(X_test_scaled, columns=X_test.columns)
# Reload the ensemble and score it on the held-out test set
with open('autoclassifier.pkl', 'rb') as f:
    model = load(f)
prediction = model.predict(X_test_final)
score = accuracy_score(y_test, prediction)
print(f"Accuracy : {score}")
Accuracy : 0.6153846153846154
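Accuracy alone is hard to interpret on an imbalanced dataset. Assuming the Yes/No column profiled earlier is the `Dataset` target, its mean string length of 2.7136 implies that about 71% of the labels are "Yes", so a model that always predicts "Yes" would already score roughly 0.71. The sketch below computes that baseline along with precision and recall from confusion-matrix counts. The `y_true` and `y_pred` lists are hypothetical toy labels, not the notebook's predictions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Majority-class baseline implied by the column profile:
# "Yes" has 3 characters and "No" has 2, so mean length - 2 is the share of "Yes".
mean_label_length = 2.7136
yes_share = mean_label_length - 2
baseline_accuracy = max(yes_share, 1 - yes_share)
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")

# Hypothetical toy labels (1 = liver disease), for illustration only
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
print("tp, fp, fn, tn:", counts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting recall for the disease class next to accuracy shows how many actual patients the model misses, which matters more than raw accuracy in a screening setting.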
The SHAP (SHapley Additive exPlanations) summary plot shows how the features in the liver disease dataset contribute to the model’s predictions across the entire test set. Each dot represents the SHAP value of one feature for one patient, and the color of the dot (ranging from blue to red) represents that feature’s value (low to high). The position on the X-axis shows whether the feature pushes the prediction toward liver disease (positive SHAP value) or away from it (negative SHAP value).
This SHAP summary plot provides a comprehensive view of how each feature in the liver disease dataset impacts the model’s predictions for the test data. By analyzing the SHAP values and feature contributions, we can better understand the factors driving the model’s predictions.
# Initialize the SHAP explainer for the Auto-sklearn model
explainer = shap.Explainer(model.predict, X_test_final)
# Generate SHAP values for the test data
shap_values = explainer(X_test_final)
# Generate a SHAP summary plot
shap.summary_plot(shap_values, X_test_final, feature_names=X_test.columns)
ExactExplainer explainer: 118it [24:56, 12.79s/it]
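To show what a SHAP value measures, the following pure-Python sketch computes exact Shapley values for a tiny model by enumerating every feature coalition. The `shapley_values` helper and `toy_model` are hypothetical. The sketch uses a single baseline row for "missing" features, which is a simplification of what the `shap` library does with a background dataset.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance against a single baseline row.

    Features outside a coalition are replaced by their baseline values.
    """
    n = len(instance)

    def value(coalition):
        row = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contribution += weight * (value(coalition | {i}) - value(coalition))
        phi.append(contribution)
    return phi

# Hypothetical toy model with an interaction term (not the liver model)
def toy_model(row):
    return 0.5 * row[0] + 2.0 * row[1] + row[0] * row[2]

x = [2.0, 1.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, background)

print("Shapley values:", [round(v, 6) for v in phi])
print("sum of values:", round(sum(phi), 6))
print("f(x) - f(baseline):", toy_model(x) - toy_model(background))
```

The values add up to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split equally between the two features involved. The enumeration grows exponentially with the number of features, which is consistent with the long runtime in the progress line above.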
The LIME (Local Interpretable Model-agnostic Explanations) output provides an explanation for an individual prediction made by the model regarding the presence or absence of liver disease. This explanation helps us understand which features contributed most to the model’s decision for this particular patient.
This LIME explanation offers clear insights into why the model predicted “No Liver Disease” for this patient by highlighting the relative importance of specific features.
# Create a LIME explainer
# Use the training data to initialize LIME explainer to capture feature distributions
explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train_final.values,  # Scaled training data, so perturbations match the scale the model was trained on
    feature_names=list(X_train_final.columns),  # Feature names
    class_names=['No Liver Disease', 'Liver Disease'],  # Class labels
    mode='classification'  # We are in classification mode
)
# Define a prediction function for LIME that directly uses scaled data
def predict_fn(data):
    # Convert NumPy array back to DataFrame
    data_df = pd.DataFrame(data, columns=X_test_final.columns)  # Ensure correct column names
    return model.predict_proba(data_df)  # Predict probabilities
# Pick a specific instance from the scaled test data (e.g., first row)
sample_idx = 0 # Change this index to explain different samples
# Generate LIME explanation for this scaled instance
exp = explainer.explain_instance(X_test_final.iloc[sample_idx].values, predict_fn)
# Show the LIME explanation in the notebook
exp.show_in_notebook(show_all=False)
# Optionally, save the explanation as an HTML file
exp.save_to_file(f"lime_explanation_{sample_idx}.html")
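The idea behind a LIME explanation can be sketched without the library: perturb the instance, query the black box, weight each perturbed sample by its proximity to the instance, and fit a weighted linear model. The pure-Python sketch below is a simplified illustration under assumptions of our own (Gaussian perturbations, an exponential kernel, no feature discretization or feature selection). `local_surrogate`, `solve_linear_system`, and `black_box` are hypothetical names, not part of `lime`.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def local_surrogate(predict, instance, n_samples=500, scale=1.0,
                    kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around `instance`.

    Returns [intercept, coef_0, coef_1, ...].
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [x + rng.gauss(0.0, scale) for x in instance]
        dist_sq = sum((zi - xi) ** 2 for zi, xi in zip(z, instance))
        rows.append([1.0] + z)          # leading 1.0 is the intercept column
        targets.append(predict(z))
        weights.append(math.exp(-dist_sq / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    k = len(instance) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)

# Hypothetical black box: a logistic score over two scaled features
def black_box(z):
    return 1.0 / (1.0 + math.exp(-(0.8 * z[0] - 1.5 * z[1])))

coefs = local_surrogate(black_box, [0.2, -0.4])
print("intercept and local weights:", [round(c, 3) for c in coefs])

# Sanity check: an exactly linear black box is recovered by the surrogate
linear_coefs = local_surrogate(lambda z: 0.3 + 2.0 * z[0] - 1.0 * z[1], [1.0, 2.0])
print("recovered linear model:", [round(c, 6) for c in linear_coefs])
```

The signs of the local weights play the same role as the bars in the LIME plot: a positive weight means that increasing the feature near this patient raises the predicted score, and a negative weight means it lowers it.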
Through this exercise using SHAP and LIME, we were able to gain clear insights into the decision-making process of the liver disease prediction model. Both XAI (Explainable AI) tools provided a deep understanding of how specific features contributed to the model’s output for individual predictions as well as the overall feature importance.
Interpretability is crucial in medical contexts like liver disease prediction because it allows the model’s predictions to be trusted and validated. Stakeholders such as doctors and healthcare professionals need to understand why a model predicted a certain outcome before making any decisions based on it. XAI tools like SHAP and LIME provide this necessary transparency.
Both tools complement each other in helping us understand and trust the model, ensuring that the predictions align with real-world medical knowledge, and providing actionable insights to healthcare professionals.